Correction Annotation for Non-Native Arabic Texts: Guidelines and Corpus

نویسندگان

  • Wajdi Zaghouani
  • Nizar Habash
  • Houda Bouamor
  • Alla Rozovskaya
  • Behrang Mohit
  • Abeer Heider
  • Kemal Oflazer
چکیده

We present our correction annotation guidelines to create a manually corrected nonnative (L2) Arabic corpus. We develop our approach by extending an L1 large-scale Arabic corpus and its manual corrections, to include manually corrected non-native Arabic learner essays. Our overarching goal is to use the annotated corpus to develop components for automatic detection and correction of language errors that can be used to help Standard Arabic learners (native and non-native) improve the quality of the Arabic text they produce. The created corpus of L2 text manual corrections is the largest to date. We evaluate our guidelines using inter-annotator agreement and show a high degree of consistency.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Large Scale Arabic Error Annotation: Guidelines and Framework

We present annotation guidelines and a web-based annotation framework developed as part of an effort to create a manually annotated Arabic corpus of errors and corrections for various text types. Such a corpus will be invaluable for developing Arabic error correction tools, both for training models and as a gold standard for evaluating error correction algorithms. We summarize the guidelines we...

متن کامل

The ASK Corpus - a Language Learner Corpus of Norwegian as a Second Language

In our paper we present the design and interface of ASK, a language learner corpus of Norwegian as a second language which contains essays collected from language tests on two different proficiency levels as well as personal data from the test takers. In addition, the corpus also contains texts and relevant personal data from native Norwegians as control data. The texts as well as the personal ...

متن کامل

Building an Arabic Machine Translation Post-Edited Corpus: Guidelines and Annotation

We present our guidelines and annotation procedure to create a human corrected machine translated post-edited corpus for the Modern Standard Arabic. Our overarching goal is to use the annotated corpus to develop automatic machine translation post-editing systems for Arabic that can be used to help accelerate the human revision process of translated texts. The creation of any manually annotated ...

متن کامل

The Second QALB Shared Task on Automatic Text Correction for Arabic

We present a summary of QALB-2015, the second shared task on automatic text correction of Arabic texts. The shared task extends QALB-2014, which focused on correcting errors in Arabic texts produced by native speakers of Arabic. The competition this year, in addition to native data, includes texts produced by learners of Arabic as a foreign language. The report includes an overview of the QALB ...

متن کامل

Syntactic Annotation Guidelines for the Quranic Arabic Dependency Treebank

The Quranic Arabic Dependency Treebank (QADT) is part of the Quranic Arabic Corpus (http://corpus.quran.com), an online linguistic resource organized by the University of Leeds, and developed through online collaborative annotation. The website has become a popular study resource for Arabic and the Quran, and is now used by over 1,500 researchers and students daily. This paper presents the tree...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2015